Abstract

The possibility of detecting mutations in a DNA from force measurements 
(as a first step towards sequence analysis) is discussed theoretically based 
on exact calculations. The force signal is associated with the domain wall 
separating the zipped from the unzipped regions. We propose a comparison 
method ("differential force microscope") to detect mutations. Two lattice 
models are treated as specific examples. 
The possibility of a force induced unzipping transition has opened up new ways
of exploring the properties of biomolecules. Since the critical force for a double stranded 
DNA depends on its sequence, an inverse problem can be posed: "Can the DNA sequence 
be detected from the force required to unzip it?" A still simpler question would be: "Can
the mutations in a DNA be detected by force measurements?" A positive answer to either
of these questions would lead to the possibility of testing (and detecting) mutations and of 
sequence determination in a non-destructive way. 

Based on a few simple models used earlier for the DNA denaturation and unzipping
transitions, we show here the signature of mutations on the f-vs-r curve (force versus
relative distance of the end points of the two strands). The inverse problem is then to get
the position of the mutations from such an experimentally realizable curve. Our emphasis 
in this paper is on the one base pair mutation problem also known as point mutation. This 
is not just of academic interest. The replication of DNA is a high-fidelity process, thanks 
to the inbuilt proof-reading and repair mechanisms, so that mistakes are very rare though 
even one could play havoc. 

Our proposal (solution of the inverse problem), based on exact calculations, is to obtain 
the force difference ("differential force") between identically stretched native and mutant 
DNAs. This could be done by using e.g. two atomic force microscope tips: we name this 
apparatus a "differential force microscope". The position of the mutation can be obtained 
from a calibration curve involving the extremum position (or its value) of the differential 
force curve. In our models, we can find the nature of the mutation from the sign of the 
differential force. 

Let us model a double stranded (ds) DNA by two N-monomer polymers interacting at
the same contour length (or monomer index) j of the strands through a contact attractive
potential $-\epsilon_j$ ($\epsilon_j > 0$), which might depend on the contour length. The interaction energy
is $H = -\sum_{j=1}^{N} \delta_{\mathbf{r}_j,0}\,\epsilon_j$, where $\mathbf{r}_j$ denotes the relative distance of the j-th base pair and
$\delta$ denotes the Kronecker delta. Other features like the self- and mutual-avoidance, base
stacking energy, helicity, etc. are ignored in this study (but see below) in order to focus
on the base pairing energy. Such a model exhibits a denaturation transition at a model-dependent
temperature $T = T_m$ from a low-T double stranded configuration to a high-T
phase of two unpaired single strands.

A few definitions: Starting with a sequence $\{\epsilon_j \mid j = 1, \ldots, N\}$ of bases of a DNA (to be
called the native DNA), we define a mutant DNA as one with almost the same base sequence
as the native molecule except for a few base pairs. A homogeneous DNA or homo-DNA is a
DNA with identical base pairs (i.e. $\epsilon_j = \epsilon$ for all j) while a heterogeneous DNA is one with
heterogeneity in the sequence (j-dependent $\epsilon_j$). Notice that the model we have defined does
not consider base pair stacking interaction and therefore does not distinguish e.g. between
an AT and a TA (or CG and GC) base pair. Consequently, the mutations we are referring
to are only those involving AT (or TA) $\leftrightarrow$ CG (or GC).

Two specific examples are considered here because of their exact tractability (analytical
and numerical): (Mi) a two-dimensional (d = 2) directed DNA model (with the strands
directed along (1, 1)) with base-pair interaction and mutual avoidance (forbidden crossing
of the strands), and (Mii) two Gaussian polymers with the base-pair interaction in d
dimensions. In both cases, only the relative chain, involving the separation $\mathbf{r}$ of the bases at
the same contour length, need be considered. These models, in addition to the denaturation
transition, also show, for $T < T_m$, an unzipping transition in the presence of a force at the free
end (j = N). In this paper, we consider the conjugate fixed-distance ensemble and
henceforth restrict ourselves to $T < T_m$. Consequently, we put $\beta = (k_B T)^{-1} = 1$ unless
explicitly shown, where $k_B$ is the Boltzmann constant.

For any ds DNA with the first monomers (j = 1) of its strands joined and the last monomers
(j = N) at a relative lattice distance $\mathbf{r}$, the force $\mathbf{f}_N(\mathbf{r})$ required to maintain this relative
distance is $\mathbf{f}_N(\mathbf{r}) = \nabla \mathcal{F}(\mathbf{r})$, where $\mathcal{F}(\mathbf{r})$ represents the free energy of the system in the
fixed-r ensemble§. By definition, $\int_0^{\infty} \mathbf{f}_N(\mathbf{r}) \cdot d\mathbf{r} = \mathcal{F}(\infty) - \mathcal{F}(0)$, which gives the free energy
of binding or the work required to unzip the DNA completely.

The two ensembles, fixed-force and fixed-stretch, are expected to give identical results in
the $N \to \infty$ limit, though for finite N inequivalence might be expected (this is indeed the
picture valid for the homo-DNA). A more serious situation arises if, e.g. for the scalar
case, $\partial f/\partial x < 0$ in the fixed-stretch ensemble, because in the fixed-force ensemble
$\partial \langle x \rangle/\partial f = \langle x^2 \rangle - \langle x \rangle^2 > 0$, where $\langle \cdot \rangle$ denotes thermal averaging. This is the case for a heterogeneous
DNA as shown in Fig. 1. The regions of "wrong" sign (reminiscent of preshocks in Burgers
turbulence [?] or of "slip" in Ref. [?]) go away only after quenched averaging but do survive
in the thermodynamic limit for each individual realization. The absence of self-averaging
in the system is encouraging, because it implies that individual features, typical of a single
realization of the sequence, are maintained in the characteristic curves. We draw attention to
the overlap of the f vs x curves over a range of x in Fig. 1 for different lengths with identical
sequence near the open end. This overlap is a sign that the modulation is a characteristic
of the sequence, and we have found this to be true quite generally.
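
The positivity of the response at fixed force follows from a standard fluctuation identity. As a short worked step (with $\beta = 1$; the symbol $\mathcal{Z}(f)$ for the fixed-force partition function is introduced here only for illustration):

```latex
\begin{aligned}
\mathcal{Z}(f) &= \sum_x Z_N(x)\, e^{f x}, \qquad
\langle x \rangle = \frac{\partial \log \mathcal{Z}}{\partial f}, \\
\frac{\partial \langle x \rangle}{\partial f}
 &= \frac{\partial^2 \log \mathcal{Z}}{\partial f^2}
 = \langle x^2 \rangle - \langle x \rangle^2 \ge 0 .
\end{aligned}
```

A negative slope of f versus x in the fixed-stretch ensemble therefore has no counterpart at fixed force.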

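Let us now consider the one mutation problem where the k-th pairing energy of the
native DNA has been changed from $\epsilon_k$ to $\epsilon'_k$. For this one-site change in H, the partition
function $Z'_{N|k}(\mathbf{r}, N)$ is given by

$$Z'_{N|k}(\mathbf{r}, N) = Z_N(\mathbf{r}, N) + c_k\, Z_N(\mathbf{r}, N|0, k), \tag{1}$$

where $c_k = \exp(\beta(\epsilon'_k - \epsilon_k)) - 1$ and $Z_N(\mathbf{r}, N)$ is the native DNA partition function with
last monomers at a relative distance $\mathbf{r}$, while $Z_N(\mathbf{r}, N|0, k)$ is with the additional constraint
that the k-th monomers are zipped. The ratio $P(\mathbf{r}, N|0, k) = Z_N(\mathbf{r}, N|0, k)/Z_N(\mathbf{r}, N)$ is the probability
that site k is zipped when the free ends are at a distance $\mathbf{r}$. The force difference (to
be called the differential force) to keep the free ends of both the mutant and the native DNA
at the same distance $\mathbf{r}$ can be written as

$$\delta\mathbf{f}_{N|k}(\mathbf{r}) = -\frac{c_k}{1 + c_k P(\mathbf{r}, N|0, k)}\, \nabla P(\mathbf{r}, N|0, k) \tag{2}$$

$$\phantom{\delta\mathbf{f}_{N|k}(\mathbf{r})} = \frac{c_k P(\mathbf{r}, N|0, k)}{1 + c_k P(\mathbf{r}, N|0, k)}\, \bigl(\mathbf{f}_N(\mathbf{r}|0, k) - \mathbf{f}_N(\mathbf{r})\bigr). \tag{3}$$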

§In the continuum the force for the Gaussian interacting polymers satisfies a Burgers-type equation.
See, e.g., Ref. [?].
Here $\mathbf{f}_N(\mathbf{r}|0, k)$ is a generalized force, the one necessary to keep the free ends of the
native DNA at a relative distance $\mathbf{r}$ when the k-th monomers are zipped. Except $c_k$, all
other quantities in Eq. (3) refer to the native DNA. This fact allows the inverse problem to be
tackled, as we show explicitly for a few cases. Since for DNA we have generally two possible
choices for $\epsilon$ ($\epsilon_1$ and $\epsilon_2$), the sign of $c_k$ determines the sign of $\delta f$ in Eq. (3). Thus, in our
simple model, the nature of the mutation can be identified from the sign of the differential
force curve. Eqs. (1) and (3) can be generalized to more than one mutation and to models with
other local energy parameters, though they become algebraically more involved. These more
complicated cases will be discussed elsewhere. We consider the simplest case here.

If $\mathbf{r} = (x, 0, \ldots, 0)$ is the direction of stretching, the quantity $P(\mathbf{r}, N|0, k)$ has a kink-like
behavior (as in Fig. 2). In the fixed-stretch ensemble, the chain separates into an unzipped
and a zipped region separated by a domain wall, which to a very good approximation can
be fit by a tanh function:

$$P(x, N|0, k) \approx P_0\, (1 + \tanh \tilde{x})/2, \qquad \tilde{x} = [x - x_d(k)]/w_d(k), \tag{4}$$

where $x_d(k)$ and $w_d(k)$ are the position and the width of the wall (kink), and $P_0$ is a constant.
Eqs. (2) and (4) suggest that $\delta f(x) = w_d(k)^{-1}\, \mathfrak{f}(\tilde{x})$, where $\mathfrak{f}(\tilde{x})$ is a scaling function.
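
To make the scaling statement explicit, one can insert the tanh form of Eq. (4) into Eq. (2). The following worked step is only a sketch: it takes Eq. (4) at face value (same sign convention) and treats $P_0$, $x_d(k)$ and $w_d(k)$ as fit parameters.

```latex
\begin{aligned}
\frac{\partial P}{\partial x} &= \frac{P_0}{2\, w_d(k)}\, \operatorname{sech}^2 \tilde{x}, \\
\delta f(x) &= -\frac{c_k\, \partial P/\partial x}{1 + c_k P}
 = \frac{1}{w_d(k)}\, \mathfrak{f}(\tilde{x}), \qquad
\mathfrak{f}(\tilde{x}) = -\frac{c_k P_0\, \operatorname{sech}^2 \tilde{x}}
 {2 + c_k P_0\, (1 + \tanh \tilde{x})} .
\end{aligned}
```

In this form the signal is confined to $|\tilde{x}| \lesssim 1$, i.e. to the domain wall, and its height scales as $1/w_d(k)$.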

It is possible to extend the above analysis to the general case where more than one
mutation is present. For example, the partition function for the case with two mutations at
positions $k_1$ and $k_2$, in an obvious notation, is

$$Z'_{N|k_1,k_2}(\mathbf{r}) = Z_N(\mathbf{r}) + \sum_{i=1,2} c_{k_i}\, Z_N(\mathbf{r}, N|0, k_i) + c_{k_1} c_{k_2}\, Z_N(\mathbf{r}, N|0, k_1|0, k_2). \tag{5}$$

The last term of the above equation, representing correlation of the mutations, gives an
additional contribution to the differential force over and above the individual contributions
of the mutations. This additional contribution is negligible if the two mutations are far away
or, more quantitatively, not in the same domain wall.
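
The structure of Eq. (5) follows from the same identity that gives Eq. (1). As a brief worked step, with $\delta_i \equiv \delta_{\mathbf{r}_{k_i},0}$ (a shorthand used only here), each mutated site multiplies the native Boltzmann weight by a factor that is linear in $\delta_i$:

```latex
e^{\beta(\epsilon'_{k_i} - \epsilon_{k_i})\,\delta_i} = 1 + c_{k_i}\,\delta_i ,
\qquad
\prod_{i=1,2}\bigl(1 + c_{k_i}\delta_i\bigr)
 = 1 + c_{k_1}\delta_1 + c_{k_2}\delta_2 + c_{k_1}c_{k_2}\,\delta_1\delta_2 .
```

Summing over configurations with the native weight turns the four terms into the four partition functions of Eq. (5).
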
We now use these general results for cases (Mi) and (Mii).

In the two-dimensional model (Mi), the partition function for two directed chains having
their last monomers at a relative lattice distance x (along (1, −1) and in units of the
elementary square diagonal), and their first monomers joined, can be written in terms of N
monomer-to-monomer transfer matrices $W_j$ (j = 1, ..., N) with matrix elements
$\langle x'|W_j|x\rangle = \bigl((\exp(\beta\epsilon_j) - 1)\,\delta_{x',0} + 1\bigr)\bigl(2\delta_{x',x} + \delta_{x',x+1} + \delta_{x',x-1}\bigr)$, $|x\rangle$, $|x'\rangle$ being the position vectors (with
the constraint $x, x' \ge 0$).
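
The transfer matrices of model (Mi) can be iterated directly on a vector of weights. The sketch below is a minimal illustration, not the authors' code: the function names, the choice of starting the recursion from a unit weight at x = 0 (strands joined), and the lattice derivative f(x) = F(x+1) − F(x) are assumptions made here, with β = 1. It also checks Eq. (1) against a direct calculation with the mutated energy.

```python
import math

def step(v, eps):
    """Apply one transfer matrix W_j to the weight vector v (v[x], x >= 0)."""
    n = len(v)
    out = [0.0] * n
    for xp in range(n):
        s = 2.0 * v[xp]                 # 2 * delta_{x',x}
        if xp > 0:
            s += v[xp - 1]              # delta_{x',x+1}
        if xp + 1 < n:
            s += v[xp + 1]              # delta_{x',x-1}
        # contact factor exp(eps) only when the pair is zipped (x' = 0)
        out[xp] = s * (math.exp(eps) if xp == 0 else 1.0)
    return out

def partition(eps_list, zipped_at=None):
    """Z_N(x) for x = 0..N; with zipped_at = k keep only paths with r_k = 0."""
    n = len(eps_list)
    v = [1.0] + [0.0] * n               # assumed start: strands joined at x = 0
    for j, eps in enumerate(eps_list, start=1):
        v = step(v, eps)
        if j == zipped_at:
            v = [v[0]] + [0.0] * n      # project onto x = 0 at site k
    return v

def mutant_partition(eps_list, k, eps_new):
    """Right-hand side of Eq. (1): Z + c_k * Z(.|0,k)."""
    c = math.exp(eps_new - eps_list[k - 1]) - 1.0
    z = partition(eps_list)
    zk = partition(eps_list, zipped_at=k)
    return [a + c * b for a, b in zip(z, zk)]

def differential_force(eps_list, k, eps_new):
    """delta f(x) = f_mutant(x) - f_native(x), with f(x) = F(x+1) - F(x), F = -log Z."""
    z = partition(eps_list)
    zm = mutant_partition(eps_list, k, eps_new)
    return [math.log(zm[x] / zm[x + 1]) - math.log(z[x] / z[x + 1])
            for x in range(len(z) - 1)]
```

For a short homo-DNA with a weakened base pair (ε' < ε, so c_k < 0), `differential_force` returns non-positive values, in line with the statement that the sign of c_k fixes the sign of δf.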

For a homo-DNA with contact energy $\epsilon$, the largest eigenvalue of W determines the free
energy and the thermodynamic properties in the limit $N \to \infty$. For $T < T_m = \epsilon/\log(4/3)$,
the melting temperature, $\beta f(x) = \cosh^{-1}\bigl(\tfrac{1}{2 z_0} - 1\bigr)$ with $z_0 = \sqrt{\lambda} - \lambda$, $\lambda = 1 - e^{-\beta\epsilon}$.
Indeed, in the fixed-stretch ensemble, any finite ($x \ll N$) stretch puts the chain on the
phase coexistence curve.

Exploiting the equivalence of the ensembles valid for homo-DNA, we find $\beta f_N(x) = F(x/N)$, where

$$F(y) = 2\tanh^{-1}(\max\{y, y_0\}), \qquad y_0 = (1 - 4 z_0)^{1/2}, \tag{6}$$

is a piecewise continuous non-analytic function. For finite N, there is no singularity but the
approximation of Eq. (6) still works quite well. At $x \approx y_0 N$ the force curve increases sharply
(see also Fig. 1, curves (a) and (c)).
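
The homo-DNA formulas for model (Mi) are easy to evaluate numerically. The sketch below is an illustration only (function names are chosen here, and temperature enters through β = 1/T with $k_B = 1$); it evaluates $z_0$, $y_0$, the coexistence force and Eq. (6), and can be used to confirm that $2\tanh^{-1} y_0$ coincides with the $\cosh^{-1}$ expression for the coexistence force.

```python
import math

def z0(eps, T=1.0):
    """z_0 = sqrt(lambda) - lambda with lambda = 1 - exp(-eps/T)."""
    lam = 1.0 - math.exp(-eps / T)
    return math.sqrt(lam) - lam

def y0(eps, T=1.0):
    """y_0 = (1 - 4 z_0)^(1/2); the max() guards against round-off near T_m."""
    return math.sqrt(max(0.0, 1.0 - 4.0 * z0(eps, T)))

def coexistence_force(eps, T=1.0):
    """f = T * arccosh(1/(2 z_0) - 1), the force on the coexistence curve."""
    return T * math.acosh(1.0 / (2.0 * z0(eps, T)) - 1.0)

def force(x, N, eps, T=1.0):
    """Fixed-stretch force from Eq. (6): f = T * 2 artanh(max(x/N, y_0)), x < N."""
    y = max(x / N, y0(eps, T))
    return T * 2.0 * math.atanh(y)

def melting_temperature(eps):
    """T_m = eps / log(4/3), where z_0 reaches 1/4 and the force vanishes."""
    return eps / math.log(4.0 / 3.0)
```

For stretches below $y_0 N$ the function `force` returns the constant coexistence value; above it the chain is fully unzipped and the force rises as for a free chain.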

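We now come to the explicit results for the one-mutation case where one $\epsilon$ is replaced
by $\epsilon' < \epsilon$. For a homo-DNA, by starting from Eq. (6) and using Eqs. (2) and (3), one can find analytical
approximations to the shapes of the previously introduced $P(x, N|0, k)$ and $\delta f$ in terms of
piecewise continuous functions. We find e.g. $P(x, N|0, k) = P(0, N|0, k)\exp(g(x))$, where
(if $N y_0 < \bar{k} \equiv N - k$):

$$g(x) = 0 \quad \text{when } x < \bar{k} y_0, \tag{7}$$
$$\phantom{g(x)} = g_{\bar{k}}(x) \quad \text{if } \bar{k} y_0 < x < N y_0, \tag{8}$$
$$\phantom{g(x)} = g_{\bar{k}}(x) - g_N(x) \quad \text{if } N y_0 < x < \bar{k}, \tag{9}$$

and $g_k(x) = \log \prod_{\pm} \left(\frac{1 \pm y_0}{1 \pm x/k}\right)^{k \pm x}$.
Eq. (3) now simplifies because $f_N(x|0, k) = f_{\bar{k}}(x)$.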
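
One way to see where a function of the type $g_k(x)$ comes from is to integrate the force difference of Eq. (6) between a chain of length k and the coexistence value. The sketch below assumes the closed form $g_k(x) = \sum_{\pm}(k \pm x)\log[(1 \pm y_0)/(1 \pm x/k)]$, obtained here from that integration rather than taken verbatim from the source, and compares it with a simple midpoint-rule quadrature. Function names are illustrative.

```python
import math

def g_closed(x, k, y0):
    """Assumed closed form: sum over +/- of (k +/- x) * log[(1 +/- y0)/(1 +/- x/k)]."""
    return ((k + x) * math.log((1.0 + y0) / (1.0 + x / k))
            + (k - x) * math.log((1.0 - y0) / (1.0 - x / k)))

def g_numeric(x, k, y0, steps=20000):
    """Minus the integral from k*y0 to x of [2 artanh(u/k) - 2 artanh(y0)] du."""
    a = k * y0
    h = (x - a) / steps
    total = 0.0
    for i in range(steps):
        u = a + (i + 0.5) * h            # midpoint of the i-th sub-interval
        total += 2.0 * math.atanh(u / k) - 2.0 * math.atanh(y0)
    return -total * h

def zipping_ratio(x, k, y0):
    """P(x)/P(0) = exp(g) in the window k*y0 < x < k, and 1 below it."""
    return 1.0 if x <= k * y0 else math.exp(g_closed(x, k, y0))
```

Near $x = k y_0$ the exponent is approximately quadratic in $x - k y_0$ with a coefficient of order 1/k, which is the origin of a domain wall width growing like $\sqrt{k}$.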



The characteristics of the differential force curve $\delta f$ vs x, such as the extremum value, its
position and the width, $\delta f_{\max}(k)$, $x_f(k)$ and $w_f(k)$, can be determined from the above equations as

$$\delta f_{\max}(k) \sim [w_f(k)]^{-1} \sim \bar{k}^{-1/2}, \qquad x_f(k) = \bar{k}\, \mathfrak{x}_f(\bar{k}), \tag{10}$$

with $\mathfrak{x}_f(0) = 1$ and $\lim_{\bar{k}\to\infty} \mathfrak{x}_f(\bar{k}) = y_0$, where $y_0$ is defined in Eq. (6). The area of the peak,
which yields the difference (with respect to the native case) in the work necessary to unzip
the molecule completely, is constant as expected. The scaling form introduced after Eq.
(4) suggests that the differential force is significant only in the domain wall region and that the
width of the domain wall $w_d(k) \sim w_f(k)$, as we do see in the numerical results.
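
The constancy of the peak area can be read off from Eq. (2). As a short worked step for the scalar case, assuming that $P(x, N|0, k)$ vanishes at full stretch:

```latex
\int_0^{\infty} \delta f_{N|k}(x)\, dx
 = -\int_0^{\infty} \frac{\partial}{\partial x}\log\bigl[1 + c_k P(x, N|0, k)\bigr]\, dx
 = \log\bigl[1 + c_k P(0, N|0, k)\bigr].
```

For a homo-DNA one expects the zero-stretch zipping probability $P(0, N|0, k)$ to be essentially independent of k away from the chain ends, so the area does not depend on the mutation position; a peak of width $w_f(k)$ then has a height of order $1/w_f(k)$.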

For model (Mii), with one-dimensional Gaussian polymers ($T_m = \infty$), $P(x, N|0, k)$ has
been calculated exactly by a transfer matrix method and is shown in Fig. 2a. The validity
of Eq. (10) ($\delta f_{\max}(k) \sim w_f(k)^{-1}$) is apparent from the data-collapse of the various $\delta f$ curves
in Fig. 2b, where $\delta f/\delta f_{\max}(k)$ is plotted against $(x - x_f(k))\,\delta f_{\max}(k)$. The peak force difference
$\delta f_{\max}(k)$ as a function of the mutant position k is in accord with the $\bar{k}^{-1/2}$ law of Eq. (10).
The results for model (Mi) are shown in Fig. 3. For d > 1 ($\mathbf{r} = (x, 0, \ldots, 0)$ as above) the
situation is similar as, e.g., the data-collapse of Fig. 2 and Eq. (10) remain valid.

Coming to the case of one mutation on a heterogeneous DNA consisting of two energies
$\epsilon_1$ and $\epsilon_2 > \epsilon_1$ chosen with equal probability, the shapes of the zipping probability curves
are found to be similar to the homo-DNA case, in fact indistinguishable on the plot of Fig.
2, with $x_d(k)$ and $w_d(k)$ sequence dependent. This indicates the validity of the domain wall
interpretation even for a heterogeneous DNA. The mutation involves a change of the energy
at site k (i.e. $\epsilon_1 \leftrightarrow \epsilon_2$). The signals $\delta f$ for various mutations are shown in Fig. 4. As already
mentioned, in our models the sign of $\delta f$ tells us the nature of the mutation. These individual
curves can again be collapsed onto a single one as for homo-DNA, though the quality of the
collapse is not as good, mainly because the area under the curve is no longer strictly a
constant. This reflects the importance of local sequences around the mutation point. Figure
3 (curves b, c) shows the k-dependence of $x_f(k)$. Unlike for homo-DNA, $\delta f_{\max}(k)$ does not
seem to have the simple form of curve (a) in Fig. 3. Although the linearity is maintained
for $x_f(k)$ as for the homogeneous case, there are regions of nonmonotonicity at small scales
which hamper the inversion.

Fig. 3 gives a basis for a calibration curve. This could be $x_f(k)$ or $\delta f_{\max}$ vs k (or both)
for a homogeneous DNA, though for heterogeneous DNA we find the $x_f$-vs-k curve to be
more reliable. Given the value of $x_f(k)$, one can look up the corresponding k in Fig. 3.
The accuracy of the method relies on the ability to resolve close-by mutation points, i.e.
mutation points in the same domain wall. There are differences in the full profiles of the
$\delta f$ curves for mutations at, say, k, k + 1, k + 2, though translating that information back
to the identification of the position is yet to be achieved. A better resolution is in any case
obtained by changing the point at which the strands are pulled.
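
The look-up step can be phrased as a small inversion routine. The sketch below is hypothetical: the calibration table is synthetic (a straight line $x_f = y_0 (N - k)$, used only as a stand-in for a measured curve), and the function name and tolerance argument are choices made here. Returning every compatible k, rather than a single value, makes the resolution limit explicit.

```python
def candidate_positions(x_f_measured, calibration, tol):
    """Return all k whose calibrated peak position is within tol of the measurement.

    calibration is a list of (k, x_f) pairs; several k can qualify when the
    curve is flat or non-monotonic on the scale of tol.
    """
    return [k for k, x_f in calibration if abs(x_f - x_f_measured) <= tol]

def synthetic_calibration(N, y0):
    """Stand-in calibration curve x_f(k) = y0 * (N - k); not data from the paper."""
    return [(k, y0 * (N - k)) for k in range(1, N)]
```

With a tolerance smaller than the spacing of neighbouring peak positions a single k is returned; a larger tolerance yields a group of adjacent candidates.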

We now discuss how our calculation compares with a potential experiment. In Ref. [?],
the typical force arising in the unzipping is between 10 and 15 pN, while the resolution is
set below 0.2 pN, so in percentage it is below 1.3 to 2%. Dynamical effects (important in Ref. [10])
are almost negligible already at the lowest stretching velocity used in Ref. [?] (20 nm/s) and
are less important as the velocity is lowered. Our values for the typical force at T = 1 are at
the border of this present-day resolution (see Fig. 3, where it appears that the resolution is
around 1% for k ~ 500, i.e. in the middle of the chains). In principle they can be improved
above it by lowering T, though it is not clear to what extent this will work because the
experimental temperature range is rather limited. In Ref. [11], the authors suggest that
there will be experimental difficulties which would hamper the acquisition of the base-by-base
sequence of DNA by means of force measurements (but would however allow one to get
information over groups of ~ 10 bases). This difficulty, though absent in our exact analysis
of the models, might also set a lower limit on the error on the position of the mutation.
Summarizing, we prove that mutations are detectable in the theoretical models. Numbers
coming from our models suggest that this measurement might be a benchmark for present-day
real technology.

We finally propose an algorithm for sequencing DNA from the unzipping force in our
models. This is defined so that the energy of the j-th base pair is $\epsilon_2$ if the average force
at stretch x = N − j is above the force signal of a homo-DNA with an attractive energy
$\epsilon = (\epsilon_1 + \epsilon_2)/2$ (T is low enough that $y_0 \approx 1$). Our algorithm differs from the discussion in Ref. [?]
because T and extra constraints (see below) play crucial roles. If we define the "score" of the
algorithm as the fraction of base pairs correctly predicted, from our data we observe that for
any finite-size sample the score is 100% for $T < T_0(N) \sim N^{-\psi}$, with $0 < \psi < 1$ for $N \to \infty$.
However, $N_0$ monomers near the open end can be sequenced at $T \sim T_0(N_0)$, no matter how
big the total N is. Once this is done, we restart, this time keeping the corresponding bases
at position $N_0$ from the open end at a distance x with the constraint that the monomers at
N are at a distance $x' > N_0 - x$, to prevent rejoining in the already unzipped $N_0$ monomers.
In this way we would sequence another $N_0$ monomers and so on. We have verified this for
models (Mi) and (Mii).
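
The thresholding rule at the core of this algorithm can be sketched in a few lines. The code below covers only the comparison step (force at stretch x = N − j against a reference homo-DNA signal) and the score; the restart with extra constraints is not included. The function names and the list-based inputs, indexed by the stretch x = 0, ..., N − 1, are assumptions made for illustration.

```python
def call_sequence(force, reference, eps1, eps2):
    """Assign eps2 to base pair j if force[N - j] exceeds reference[N - j], else eps1.

    force[x] and reference[x] are the average forces at stretch x for the
    sample and for the reference homo-DNA; the result is ordered j = 1..N.
    """
    n = len(force)
    sequence = []
    for j in range(1, n + 1):
        x = n - j                        # stretch associated with base pair j
        sequence.append(eps2 if force[x] > reference[x] else eps1)
    return sequence

def score(predicted, true):
    """Fraction of base pairs predicted correctly."""
    hits = sum(1 for p, t in zip(predicted, true) if p == t)
    return hits / len(true)
```

In a block-by-block scheme of the kind described above, the same comparison would be applied to each newly exposed group of $N_0$ monomers.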

In conclusion, we studied the f vs. r characteristic curve in the fixed stretch ensemble 
for simple models of DNA focusing on the base pairing energy only. We have seen that for a 
homo-DNA, the force difference between a native and the corresponding one mutation case 
when pulled to the same distance contains enough signature to locate the position of the 
mutation. This could be the basis of a differential force microscope to detect mutations. 
For a heterogeneous DNA, the mutation point cannot always be localized as accurately as
for homo-DNA. Accuracy could be achieved by taking cognizance of the full features of the
$\delta f$ curve. We have shown that the differential force curve can be understood as due to
the domain wall separating the zipped and the unzipped phases as the strands are pulled 
apart. Moreover, we found (Fig. 1) that the modulations in the force curve are connected 
to the local sequence. This holds promise of extension of our proposal to cases beyond point 
mutation. 
FIG. 1. The force vs stretching distance curves for heterogeneous DNAs (model (Mi)). The
sequences are chosen randomly but both share the same sequence (1000 $\epsilon$'s) from the open, pulled
end. For the fixed-stretch ensemble, curves (a) and (b), the pattern is identical over a region of x.
Curve (c) is the fixed-force ensemble phase coexistence curve with finite-size effect. The length of
the unzipped part in units of base pairs is approximately $x/y_0$ (see Eq. (6)).




FIG. 2. (a) The collapse of $P(x, N|0, k)/P_0$ vs. $\tilde{x}$ where $x_d(k)$, $w_d(k)$ are obtained by fitting
Eq. (4) (solid curve). For clarity only three cases of k are shown. (b) The collapse plot of $\delta f/\delta f_{\max}(k)$
vs $(x - x_f(k))\,\delta f_{\max}(k)$. A similar collapse is found even with $x_d(k)$ and $w_d(k)$ of (a). These are
for model (Mii), homo-DNA, N = 5000, d = 1 and $\beta\epsilon = 2\beta\epsilon' = 1.5$. Arrows point towards the
relevant axes.

FIG. 3. The "calibration" curves $\delta f_{\max}(k)$ and $x_f(k)$ for (a,b) a homogeneous DNA and (c)
a heterogeneous DNA (only $x_f(k)$ is shown). The curves fitting the data according to Eq. (10) are
shown. Parameters are as in Fig. 1.

FIG. 4. (a) The $\delta f$ vs x curve for a heterogeneous DNA. The sign of $\delta f$ gives the nature of
the mutation. (b) The collapse plot of the curves of (a) as in Fig. 2. The plots are for model (Mi),
N = 1000 and $\epsilon_2 = 2\epsilon_1 = 1$ (with T = 1).